news story
Learning to Interpret Weight Differences in Language Models
Goel, Avichal, Kim, Yoon, Shavit, Nir, Wang, Tony T.
Finetuning (pretrained) language models is a standard approach for updating their internal parametric knowledge and specializing them to new tasks and domains. However, the corresponding model weight changes ("weight diffs") are not generally interpretable. While inspecting the finetuning dataset can give a sense of how the model might have changed, these datasets are often not publicly available or are too large to work with directly. Towards the goal of comprehensively understanding weight diffs in natural language, we introduce Diff Interpretation Tuning (DIT), a method that trains models to describe their own finetuning-induced modifications. Our approach uses synthetic, labeled weight diffs to train a DIT-adapter, which can be applied to a compatible finetuned model to make it describe how it has changed. We demonstrate in two proof-of-concept settings (reporting hidden behaviors and summarizing finetuned knowledge) that our method enables models to describe their finetuning-induced modifications using accurate natural language descriptions.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Media > Music (0.46)
- Leisure & Entertainment > Sports (0.46)
- Leisure & Entertainment > Games (0.46)
Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings
Hanley, Hans W. A., Durumeric, Zakir
Contextual large language model embeddings are increasingly utilized for topic modeling and clustering. However, current methods often scale poorly, rely on opaque similarity metrics, and struggle in multilingual settings. In this work, we present a novel, scalable, interpretable, hierarchical, and multilingual approach to clustering news articles and social media data. To do this, we first train multilingual Matryoshka embeddings that can determine story similarity at varying levels of granularity based on which subset of the dimensions of the embeddings is examined. This embedding model achieves state-of-the-art performance on the SemEval 2022 Task 8 test dataset (Pearson $\rho = 0.816$). Once the model is trained, we develop an efficient hierarchical clustering algorithm that leverages the hierarchical nature of Matryoshka embeddings to identify unique news stories, narratives, and themes. We conclude by illustrating how our approach can identify and cluster stories, narratives, and overarching themes within real-world news datasets.
- Asia > North Korea (0.28)
- Europe > Ukraine (0.14)
- Asia > Russia (0.14)
- Research Report (1.00)
- Overview (0.68)
- Media > News (1.00)
- Information Technology (1.00)
- Government > Foreign Policy (0.93)
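The idea of reading story similarity off a subset of embedding dimensions can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the subsets are leading prefixes of the vector (the usual Matryoshka convention), and the function names, the 8-dimensional toy vectors, and the prefix sizes are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prefix_similarity(u, v, dims):
    """Compare two embeddings using only their first `dims` dimensions.

    Assumption: nested prefixes are the subsets being examined.
    """
    return cosine(u[:dims], v[:dims])


# Hypothetical toy embeddings for two articles (not real model output).
article_a = [0.9, 0.1, 0.3, -0.2, 0.5, 0.4, -0.1, 0.2]
article_b = [0.8, 0.2, -0.4, 0.3, -0.5, 0.1, 0.6, -0.3]

# Similarity measured at three hypothetical levels of granularity.
similarity_by_level = {
    dims: prefix_similarity(article_a, article_b, dims) for dims in (2, 4, 8)
}

for dims, sim in similarity_by_level.items():
    print(f"first {dims} dims: cosine similarity = {sim:.3f}")
```

In this toy case the two articles look alike on the shortest prefix and diverge once more dimensions are included, which is the kind of level-dependent signal a hierarchical clustering step could exploit.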
LLM for Comparative Narrative Analysis
Kampen, Leo, Villarreal, Carlos Rabat, Yu, Louis, Karmaker, Santu, Feng, Dongji
In this paper, we conducted a Multi-Perspective Comparative Narrative Analysis (CNA) on three prominent LLMs: GPT-3.5, PaLM2, and Llama2. We applied identical prompts and evaluated their outputs on specific tasks, ensuring an equitable and unbiased comparison between various LLMs. Our study revealed that the three LLMs generated divergent responses to the same prompt, indicating notable discrepancies in their ability to comprehend and analyze the given task. Human evaluation was used as the gold standard, evaluating four perspectives to analyze differences in LLM performance.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
Will A.I. Save the News?
I am a forty-five-year-old journalist who, for many years, didn't read the news. In high school, I knew about events like the O. J. Simpson trial and the Oklahoma City bombing, but not much else. In college, I was friends with geeky economics majors who read The Economist, but I'm pretty sure I never actually turned on CNN or bought a paper at the newsstand. I read novels, and magazines like Wired and Spin. If I went online, it wasn't to check the front page of the Times but to browse record reviews from College Music Journal. Somehow, during this time, I thought of myself as well informed.
- North America > United States > Oklahoma > Oklahoma County > Oklahoma City (0.24)
- North America > United States > New Jersey (0.14)
- Europe (0.14)
- Asia (0.14)
- Media > News (1.00)
- Government (1.00)
Fact-checking AI-generated news reports: Can LLMs catch their own lies?
Yao, Jiayi, Sun, Haibo, Xue, Nianwen
In this paper, we evaluate the ability of Large Language Models (LLMs) to assess the veracity of claims in "news reports" generated by themselves or other LLMs. Our goal is to determine whether LLMs can effectively fact-check their own content, using methods similar to those used to verify claims made by humans. Our findings indicate that LLMs are more effective at assessing claims in national or international news stories than in local news stories, better at evaluating static information than dynamic information, and better at verifying true claims than false ones. We hypothesize that this disparity arises because the former types of claims are better represented in the training data. Additionally, we find that incorporating retrieved results from a search engine in a Retrieval-Augmented Generation (RAG) setting significantly reduces the number of claims an LLM cannot assess. However, this approach also increases the occurrence of incorrect assessments, partly due to irrelevant or low-quality search results. This diagnostic study highlights the need for future research on fact-checking machine-generated reports to prioritize improving the precision and relevance of retrieved information to better support fact-checking efforts. Furthermore, claims about dynamic events and local news may require human-in-the-loop fact-checking systems to ensure accuracy and reliability.
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Massachusetts > Middlesex County > Watertown (0.04)
- Media > News (1.00)
- Leisure & Entertainment > Sports > Basketball (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Leisure & Entertainment > Sports > Olympic Games (0.68)
Large Language Models and Provenance Metadata for Determining the Relevance of Images and Videos in News Stories
Peterka, Tomas, Bohacek, Matyas
The most effective misinformation campaigns are multimodal, often combining text with images and videos taken out of context -- or fabricating them entirely -- to support a given narrative. Contemporary methods for detecting misinformation, whether in deepfakes or text articles, often miss the interplay between multiple modalities. Built around a large language model, the system proposed in this paper addresses these challenges. It analyzes both the article's text and the provenance metadata of included images and videos to determine whether they are relevant. We open-source the system prototype and interactive web interface.
- South America > Ecuador (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England (0.04)
Real-time Fake News from Adversarial Feedback
Chen, Sanxing, Huang, Yukun, Dhingra, Bhuwan
We show that existing evaluations for fake news detection based on conventional sources, such as claims on fact-checking websites, result in high accuracies over time for LLM-based detectors -- even after their knowledge cutoffs. This suggests that recent popular fake news from such sources can be easily detected due to pre-training and retrieval corpus contamination or increasingly salient shallow patterns. Instead, we argue that a proper fake news detection dataset should test a model's ability to reason factually about the current world by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive fake news that challenges LLMs. Our iterative rewrite decreases the binary classification ROC-AUC by an absolute 17.5 percent for a strong RAG-based GPT-4o detector. Our experiments reveal the important role of RAG in both detecting and generating fake news, as retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG detection helps discover more deceitful patterns in fake news.
- North America > Haiti (0.28)
- North America > Mexico (0.14)
- Europe > Austria > Vienna (0.14)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
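To make the reported "absolute" ROC-AUC decrease concrete, the sketch below computes ROC-AUC before and after a rewrite and takes the difference in percentage points. It is an illustration only: the labels and detector scores are invented toy values, not data from the paper, and `roc_auc` is a hypothetical helper using the pairwise-ranking definition of AUC.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (assumption): label 1 = fake news, 0 = real news.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical detector scores before the fake items are rewritten.
scores_before = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
# Hypothetical scores after rewriting: two fake items now look more credible.
scores_after = [0.9, 0.8, 0.35, 0.25, 0.4, 0.3, 0.2, 0.1]

auc_before = roc_auc(labels, scores_before)
auc_after = roc_auc(labels, scores_after)

# An absolute drop is a plain difference, expressed in percentage points.
absolute_drop_points = (auc_before - auc_after) * 100
print(f"ROC-AUC {auc_before:.4f} -> {auc_after:.4f} "
      f"(absolute drop: {absolute_drop_points:.2f} points)")
```

An absolute decrease is the plain difference between the two scores, as opposed to a relative decrease, which would divide that difference by the starting value.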
Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Zhou, Yuhang, Zhu, Jing, Xu, Paiheng, Liu, Xiaoyu, Wang, Xiyao, Koutra, Danai, Ai, Wei, Huang, Furong
Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. In particular, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain examples, BalDistill achieves state-of-the-art performance across diverse long-tailed datasets, enhancing both the efficiency and efficacy of the distilled models.
- Research Report (1.00)
- Overview (0.68)
Philly sheriff slammed for losing guns, AI-generated news stories, thousands spent on mascot, DJs: Report
Tiffany Henyard, the embattled mayor of Dolton, Illinois, faced such an outcry of anger from town residents that many had to be kept outside the building. Much like Henyard, Dolton's self-declared "Super Mayor," Philadelphia Sheriff Rochelle Bilal has been slammed with allegations of wild offenses ranging from spending department money on promotional items like trading cards with her likeness to having bogus news stories about her being generated by AI. While Bilal testified before the City Council last year that her department is underfunded to the point it "jeopardizes the lives and safety of our sworn and civilian personnel," her department's spending habits indicate that money may have been used in questionable ways, according to a new report from The Philadelphia Inquirer. The Philadelphia Sheriff's Office allegedly spent $9,250 on a new mascot, an African-American Wild Western-style female sheriff named Deputy Sheriff Justice, who debuted at the Thanksgiving Day parade, made by a company that makes some of the world's most recognizable mascot costumes, like that of the Geico gecko.
- Media > News (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government (1.00)
Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Missouri > Jackson County > Kansas City (0.05)
- North America > United States > Colorado (0.05)